Background/Aims 

Symptom reflux association analysis is especially helpful for evaluation and management of proton pump inhibitor (PPI) refractory
patients. An accurate calculation requires manual editing of 24-hour multichannel intraluminal impedance-pH (MII-pH)
tracings after automatic analysis. Intra- and inter-observer agreement as well as reliability of rapid editing confined to the time
around symptomatic episodes are unknown. The aim of this study was to explore these topics in a prospective multicenter study.

Methods 

Forty consecutive patients who were off PPI therapy underwent MII-pH recordings. After automatic analysis, their tracings were 
anonymized and randomized. Three experienced observers, each one trained in a different European center, independently
performed manual editing of 24-hour tracings on 2 separate occasions. Values of symptom index and symptom association
probability for acid and non-acid reflux were transformed into a binary response (i.e., positive or negative).

Results 

Intra-observer agreement on symptom reflux association was 92.5% to 100.0% for acid and 85.0% to 97.5% for non-acid 
reflux. Inter-observer agreement was 100.0% for acid and 82.5% to 95.0% for non-acid reflux. Values for symptom index and 
symptom association probability were similar. Concordance between 24-hour and rapid (2-minute window before each symptomatic
episode) editings for symptom reflux association occurred in 39 to 40 patients (acid) and in 37 to 40 (non-acid), depending
on the observer.

Conclusions 

Intra- and inter-observer agreement in classifying patients with or without symptom reflux association at manual editing of 
24-hour tracings was high, especially for acid reflux. Classifying patients according to a rapid editing showed excellent con- 
cordance with the 24-hour one and can be adopted in clinical practice. 
(J Neurogastroenterol Motil 2014;20:205-211) 
Introduction 

Esophageal 24-hour multichannel intraluminal impedance- 
pH monitoring (MII-pH) is currently considered the gold stand- 
ard for evaluation of gastro-esophageal reflux disease (GERD). 
Its advantage over traditional pH-monitoring is the ability to detect
weakly acidic reflux episodes in addition to acid reflux and also
to differentiate among liquid, gaseous and mixed liquid/gaseous
refluxes. Observations with this technique have shown the
clinical relevance of weakly acidic reflux especially in patients
poorly responsive to proton pump inhibitors (PPI).

Whereas pH-monitoring analysis is automatic and very quick, 
analysis of MII-pH tracings is much more time consuming be- 
cause it needs manual revision of tracing after the automatic anal- 
ysis, especially because events other than reflux are included 
among reflux episodes by the software. Automatic analysis particularly
overestimates the number of non-acid reflux events, resulting
in a lower sensitivity and specificity of a positive symptom index
(SI) compared to visual analysis. Moreover, a low baseline
impedance, which may be observed especially in the presence of erosive
esophagitis or Barrett's esophagus, makes the analysis more
difficult and mistakes more likely.

Information resulting from MII-pH is important especially 
in patients refractory to PPIs because it guides medical treatment 
and may suggest usefulness of anti-reflux surgery. Both a quanti- 
tative (i.e., number of reflux episodes) and a qualitative analysis 
(i.e., symptom reflux association) should be performed, the latter 
analysis having a higher relevance in PPI refractory patients who 
frequently have a normal number of reflux episodes. Studies on
intra- and inter-observer agreement of manual analysis are scanty and
small, refer to the paediatric population or to healthy
adults, and have focused on the number of reflux episodes only.

In clinical practice, physicians often concentrate their editing
on the time window around symptomatic episodes in order to save
time; however, there are so far no data on the reliability of such a partial,
quick analysis of MII-pH tracings.

Aims of this study were to evaluate: (1) agreement within and 
between 3 experienced observers trained in different European 
Centers for presence/absence of symptom reflux association according
to currently used indexes and for detection of individual
reflux episodes and (2) concordance between the traditional 24- 
hour manual analysis and a quicker one for presence/absence of 
symptom reflux association. 



Materials and Methods 

Patient Population 

Between September 2011 and January 2012, forty consecutive
patients off PPI therapy with typical (i.e., heartburn and regurgitation)
and/or atypical (i.e., chest pain) esophageal or extra-esophageal
(i.e., cough) symptoms possibly related to GERD, who
underwent 24-hour MII-pH in 2 Centers in Northern Italy
(Milan and Verona) and reported symptomatic episodes
during the test, were prospectively enrolled. Each center
provided 20 MII-pH tracings. The study protocol was
approved by the Ethics Committees of both hospitals.

Impedance-pH Equipment 

Esophageal MII-pH monitoring was performed using a 
MII-pH catheter (Z61A; Medical Measurement Systems, 
Enschede, The Netherlands) containing one distal antimony pH 
electrode and eight impedance electrode rings at 2, 4, 6, 8, 10, 14, 
16 and 18 cm from the tip of the catheter. Each pair of adjacent 
electrodes represents an impedance-measuring segment (2 cm in 
length) corresponding to one recording channel. The eight im- 
pedance and pH signals were recorded at 50 Hz on a 128 MB 
Compact Flash Card. Data were stored in a portable receiver with 
impedance amplifier (Medical Measurement Systems). 

Study Protocol 

After an overnight fast, patients attended the Upper Gastro- 
intestinal (GI) Physiology Unit of both Centers. Patient's medi- 
cal history was collected and informed consent was signed. The 
lower esophageal sphincter (LES) was located by esophageal 
manometry and the MII-pH catheter was passed trans-nasally 
under topical anaesthesia and positioned with the pH electrode 5 
cm above the upper border of the LES. During MII-pH monitoring,
patients were asked to report timing of meals and periods
spent in the recumbent position on a daily diary card; when a symptom
occurred, patients were asked to push a button on the portable
receiver and to report the exact time on the diary card. When
many symptoms were reported, only the principal symptom was 
taken into account. During the recording period patients were al- 
lowed to have a free diet, except for known acidic food and bev- 
erages, and to continue their usual daily activities. Patients re- 
turned to the Upper GI Physiology Unit on the following morn- 
ing for catheter removal. 
All tracings were read twice by the three observers, each one 
trained in a different Center (London, Milan and Verona). The 
three observers were all experienced in the analysis of MII-pH 
tracings, having previously analyzed at least 200 tracings and per- 
forming at least 70 MII-pH/year. The second reading of each 
tracing was performed at least 12 weeks after the first one. 

Data Analysis 

Data stored on the Compact Flash Card were downloaded 
into a personal computer. Markers of meal periods and of timing 
in recumbent position were manually inserted. Data were ana- 
lysed by using an automated reflux detection algorithm (Medical 
Measurement Systems) and meal periods were excluded from the 
analysis. Original tracings were anonymized and numbered from
01 to 20 (provided by Milan) and from 41 to 60 (provided by
Verona) for inter-observer agreement analysis. These tracings
were subsequently duplicated and numbered in a randomized order
from 21 to 40 and from 61 to 80 for intra-observer agreement
analysis. Each tracing was named adding a code identifying the 
Center in order to distinguish those reviewed by each observer 
(observer 1 from Milan, observer 2 from Verona and observer 3 
from London). In order to identify tracings difficult to analyse, 
baseline impedance of each tracing was measured before manual 
analysis. Baseline esophageal impedance was assessed as the mean
baseline at the two most distal impedance channels (situated at 3
and 5 cm above the LES), considering a 5-minute window
during the night. Baseline impedance was considered low if <
500 Ω. The traditional 24-hour manual analysis was performed
as follows. Each observer went through every reflux episode and, 
when he/she did not agree with the automatic analysis, the reflux 
episode was erased; furthermore reflux episodes not recognized 
by the automatic analysis were added. In order to avoid a possible 
bias due to variability of a further analysis, the rapid analysis was 
obtained by checking for changes that each observer had made 
during his/her 24-hour analysis in the 2-minute window period
preceding each symptom and copying them into a separate automated
analysis file.
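
The derivation of the rapid analysis from the 24-hour edits can be sketched in a few lines. This is only an illustration under stated assumptions: the `Edit` record, its field names and the "erase"/"add" labels are hypothetical and do not reflect the file format of the analysis software; only the 2-minute window preceding each symptom comes from the text.

```python
from dataclasses import dataclass

WINDOW_S = 120  # 2-minute window preceding each symptom, as described in the text


@dataclass(frozen=True)
class Edit:
    """One manual change made during the 24-hour editing (hypothetical record)."""
    time_s: float  # time of the edited reflux episode, in seconds from recording start
    action: str    # "erase" or "add" (illustrative labels)


def edits_for_rapid_analysis(edits, symptom_times_s, window_s=WINDOW_S):
    """Keep only the edits that fall within the window preceding any symptom."""
    return [
        e for e in edits
        if any(0 <= s - e.time_s <= window_s for s in symptom_times_s)
    ]


# Illustrative data: three edits, two symptoms marked at 180 s and 3600 s.
edits = [Edit(100, "erase"), Edit(500, "add"), Edit(3550, "erase")]
symptoms = [180, 3600]
rapid = edits_for_rapid_analysis(edits, symptoms)
print([e.time_s for e in rapid])
```

Because the rapid analysis reuses the edits already made, rather than a fresh reading, any difference from the 24-hour result is attributable only to the restriction of the time window.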

Definitions 

Reflux episodes 

Only liquid and mixed liquid-gas reflux episodes according
to impedance changes were included in the analysis. These reflux
episodes were classified by pH drop nadir into: (1) acid reflux:
impedance-detected reflux event with a nadir pH less than 4, (2)
weakly acid reflux: impedance-detected reflux event with a nadir
pH between 4 and 6, and (3) weakly alkaline reflux: impedance-detected
reflux event with a nadir pH above 7. As weakly alkaline
refluxes are very infrequent, in the analysis they were merged 
with weakly acidic refluxes and considered as non-acid reflux. 
Total number of reflux episodes was considered pathological 
when > 75/24 hours. 
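
As a minimal sketch of the classification used in the analysis, the snippet below assigns each impedance-detected episode to acid (nadir pH below 4) or non-acid (all other episodes, since weakly acid and weakly alkaline refluxes were merged) and flags a pathological total above 75 per 24 hours. The function names and the input format (a plain list of nadir pH values) are assumptions made for illustration.

```python
def classify_reflux(nadir_ph):
    """Return 'acid' for a nadir pH below 4, otherwise 'non-acid' (merged category)."""
    return "acid" if nadir_ph < 4 else "non-acid"


def summarize_episodes(nadir_phs, pathological_threshold=75):
    """Count acid and non-acid episodes and flag a pathological total (> threshold)."""
    acid = sum(1 for ph in nadir_phs if classify_reflux(ph) == "acid")
    total = len(nadir_phs)
    return {
        "acid": acid,
        "non_acid": total - acid,
        "total": total,
        "pathological": total > pathological_threshold,
    }


# Illustrative nadir pH values for five episodes.
summary = summarize_episodes([2.1, 3.9, 4.0, 5.5, 7.2])
print(summary)
```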

Symptom-reflux association 

SI and symptom association probability (SAP) were auto- 
matically calculated by the software in each patient. Only the as- 
sociation between the principal symptom reported by the patient 
and acid and non-acid reflux was reported. SI and SAP were defined
according to Wiener et al and Weusten et al, respectively.
SI was considered positive when > 50% and SAP when > 95%.
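
For readers unfamiliar with the two indexes, the sketch below shows how they are commonly computed. It is not the software's implementation. It assumes that SI is the percentage of symptom episodes preceded by reflux within a 2-minute window, and that SAP is obtained from a 2x2 table of 2-minute segments (symptom yes/no by reflux yes/no) tested with Fisher's exact test, SAP = (1 - p) x 100. The function names, the input format (event times in seconds) and the two-sided form of the test are assumptions.

```python
from math import comb

WINDOW_S = 120  # 2-minute window


def _reflux_before(symptom_s, reflux_times, window_s):
    """True if any reflux episode started within the window preceding the symptom."""
    return any(0 <= symptom_s - r <= window_s for r in reflux_times)


def symptom_index(symptom_times, reflux_times, window_s=WINDOW_S):
    """Percentage of symptom episodes that are preceded by a reflux episode."""
    if not symptom_times:
        return 0.0
    related = sum(1 for s in symptom_times if _reflux_before(s, reflux_times, window_s))
    return 100.0 * related / len(symptom_times)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def symptom_association_probability(symptom_times, reflux_times, duration_s,
                                    window_s=WINDOW_S):
    """SAP in percent, from a 2x2 table of 2-minute segments (simplified sketch)."""
    n_segments = int(duration_s // window_s)
    reflux_segments = len({int(r // window_s) for r in reflux_times})
    s_pos_r_pos = sum(1 for s in symptom_times if _reflux_before(s, reflux_times, window_s))
    s_pos_r_neg = len(symptom_times) - s_pos_r_pos
    s_neg_r_pos = max(reflux_segments - s_pos_r_pos, 0)
    s_neg_r_neg = max(n_segments - reflux_segments - s_pos_r_neg, 0)
    p = fisher_two_sided(s_pos_r_pos, s_pos_r_neg, s_neg_r_pos, s_neg_r_neg)
    return 100.0 * (1.0 - p)


# Illustrative 24-hour recording: three symptoms, two of them preceded by reflux.
symptoms = [100, 1000, 5000]
refluxes = [50, 950, 3000]
si = symptom_index(symptoms, refluxes)
sap = symptom_association_probability(symptoms, refluxes, duration_s=24 * 3600)
print(si > 50, sap > 95)  # binary responses, using the thresholds given in the text
```

Transforming both indexes into a positive/negative response, as in the last line, yields the binary variables on which agreement was assessed.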

Statistical Methods 

Intra-observer repeatability and inter-observer reproduci- 
bility of esophageal 24-hour pH-impedance analysis were ex- 
pressed as percentage of agreement and as Cohen's kappa statistic. 
These measures were applied to agreement on positive/negative 
symptom-reflux association and on individual reflux episodes. A 
kappa equal to 0 meant that the agreement was no better than that
expected by chance alone, and a kappa value equal to 1 indicated
perfect agreement.
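
The two agreement measures can be illustrated with a short sketch for binary (positive/negative) ratings. The data below are invented for illustration and are not study results; the function names are assumptions. Note that kappa is undefined when both ratings are entirely negative, because the expected chance agreement then equals 1.

```python
def percent_agreement(first, second):
    """Percentage of paired ratings that are identical."""
    return 100.0 * sum(x == y for x, y in zip(first, second)) / len(first)


def cohen_kappa(first, second):
    """Cohen's kappa for two paired lists of booleans; None if undefined."""
    n = len(first)
    p_observed = sum(x == y for x, y in zip(first, second)) / n
    p_first = sum(first) / n    # proportion rated positive in the first list
    p_second = sum(second) / n  # proportion rated positive in the second list
    p_expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    if p_expected == 1.0:
        return None  # all ratings identical and one-sided: kappa cannot be calculated
    return (p_observed - p_expected) / (1 - p_expected)


# Invented example: 40 tracings, 5 positive at first reading, 4 of them at second reading.
first_reading = [True] * 5 + [False] * 35
second_reading = [True] * 4 + [False] * 36
agreement = percent_agreement(first_reading, second_reading)
kappa = cohen_kappa(first_reading, second_reading)
print(agreement, round(kappa, 3))
```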



Table 1. Variables of the 40 Multichannel Intraluminal 24-hour Impedance-pH Tracings as Assessed by the 3 Observers

                                          Observer 1    Observer 2    Observer 3
                                          (Milan)       (Verona)      (London)
AC reflux episodes (median [range])       25 (1-90)     22 (0-83)     25 (1-91)
NA reflux episodes (median [range])       19 (2-89)     8 (1-76)      21 (2-99)
Total reflux episodes (median [range])    44 (5-99)     28 (1-89)     49 (4-106)
Positive SI for AC (n [%])                5 (12.5)      5 (12.5)      5 (12.5)
Positive SI for NA (n [%])                2 (5.0)       0 (0.0)       1 (2.5)
Positive SAP for AC (n [%])               6 (15.0)      5 (12.5)      6 (15.0)
Positive SAP for NA (n [%])               8 (20.0)      2 (5.0)       5 (12.5)

AC, acid; NA, non acid; SI, symptom index; SAP, symptom association probability.
Table 2. Kappa Values With Standard Error Between the First and the Second Analysis, With Regards to Symptom Index and Symptom Association Probability Divided Into Acid/Non-acid

              SI                            SAP
              Acid          Non-acid        Acid          Non-acid
Observer 1    0.77 (0.15)   0.65 (0.31)     0.80 (0.13)   0.48 (0.18)
Observer 2    0.77 (0.15)                   0.72 (0.15)   0.79 (0.20)
Observer 3    1.00 (0.00)                   1.00 (0.00)   0.54 (0.20)

Empty cells: kappa coefficient could not be calculated because all results were negative in the second analysis.
SI, symptom index; SAP, symptom association probability.
Data are presented as kappa coefficient (standard error).

Table 3. Kappa Values With Standard Error Between Observers, With Regards to Symptom Index and Symptom Association Probability Divided Into Acid/Non-acid

                             SI                            SAP
                             Acid          Non-acid        Acid          Non-acid
Observer 1 vs. Observer 2    1.00 (0.00)                   1.00 (0.00)   0.35 (0.19)
Observer 1 vs. Observer 3    1.00 (0.00)                   1.00 (0.00)   0.36 (0.19)
Observer 2 vs. Observer 3    1.00 (0.00)                   1.00 (0.00)   0.23 (0.23)

Empty cells: kappa coefficient could not be calculated because all results were negative for observer 1 and 2.
SI, symptom index; SAP, symptom association probability.
Data are presented as kappa coefficient (standard error).

Figure 1. Percentage of concordance between the first and the second
analysis, with regards to symptom index (SI) and symptom association
probability (SAP) divided into acid (AC) and non-acid (NA) reflux.

Figure 2. Percentage of concordance between observers, with regards
to symptom index (SI) and symptom association probability (SAP)
divided into acid (AC) and non-acid (NA) reflux.

Results

Patient Characteristics

Twenty-three of the 40 enrolled patients were women and the
median age was 55 years (range, 27 to 88 years). All patients
completed the study and the recording period was more than 23
hours in all of them. Seventeen (42.5%), 7 (17.5%) and 16 patients
(40%) experienced typical symptoms, chest pain and
cough, respectively. All patients had a previous upper GI endoscopy
showing grade A erosive esophagitis in 5/40 patients
(12.5%). No Barrett's esophagus was detected. No patients had a
low esophageal impedance baseline, with the median value being
2487 Ω (range, 662-5548 Ω). Table 1 shows variables of the 40
MII-pH tracings as assessed by the 3 observers. The total number
of reflux episodes was more than 75 in 5/40 (12.5%), 2/40
(5%) and 5/40 (12.5%) tracings for observer 1, 2 and 3, respectively.

Agreement for Symptom/Reflux Association 

Intra-observer (see Table 2 and Fig. 1) 

Agreement between the first and the second analysis was gen- 
erally high for all 3 observers and slightly better for acid com- 
pared to non-acid refluxes. 






Inter-observer (see Table 3 and Fig. 2). 

Agreement between the 3 observers was generally good, al- 
though it was higher for acid refluxes compared to non-acid ones. 

Agreement for Detection of Individual Reflux 
Episodes 

Intra-observer 

Median intra-observer agreement between first and second 
analysis was 98.2% (range, 92.4-99.6%) for acid episodes with a 
median kappa coefficient of 0.68 whereas it was 92.3% (range, 
82.1-92.4%) for non-acid episodes with a median kappa coefficient
of 0.40. Intra-observer agreement for judging a study
normal or pathological on the basis of the number of reflux episodes
was almost perfect for all the observers, as the number of
studies with a pathological number of reflux episodes remained 
the same for observer 2 and increased from 5 to 6/40 for observer 
1 and 3 in the second analysis. 

Inter-observer 

Median inter-observer agreement was 86.8% (range, 86.3- 
97.6%) for acid episodes with a median kappa coefficient of 0.22. 
Median agreement was lower for non-acid episodes, 55.7% (range, 
48.9-81.5%) with a median kappa of 0.19. 

Symptom Reflux Association: Concordance 
Between 24-hour and a Rapid Analysis 

Rapid editing proved to be highly predictive of the traditional
24-hour one for all 3 observers with regard to the four symptom
reflux association variables (Table 4). In particular, rapid and traditional
24-hour editing showed concordance in 39 (97.5%) to 40
(100.0%) patients for acid reflux and in 37 (92.5%) to 40 (100.0%)
for non-acid reflux, depending on the observer.

Table 4. Percentage of Concordance With Confidence Interval Between Short and Traditional Analysis

              SI                                  SAP
              Acid              Non-acid          Acid              Non-acid
Observer 1    100.0 (91-100)    100.0 (91-100)    100.0 (91-100)    97.5 (85-100)
Observer 2    97.5 (85-100)     95.0 (82-99)      97.5 (85-100)     92.5 (78-98)
Observer 3    97.5 (85-100)     100.0 (91-100)    100.0 (91-100)    100.0 (91-100)

SI, symptom index; SAP, symptom association probability.
Data are presented as percentage of concordance (confidence interval).

Discussion 

Results of our study showed that intra- and inter-observer 
agreement for presence or absence of a symptom-reflux association
was high, though slightly lower for non-acid (82.0-97.5%)
than for acid reflux (92.0-100.0%). Furthermore, and more in- 
terestingly from a practical point of view, a rapid analysis of symp- 
tom reflux association confined to the time around symptomatic 
episodes was highly predictive of the analysis performed over 24 
hours. 

Evaluation of symptom/reflux association is the most useful 
variable in the analysis of MII-pH monitoring performed in patients
referred to specialized Centers, who are frequently PPI resistant
and often have normal reflux exposure. When symptom
reflux association is negative for both acid and non-acid reflux
in a patient with normal reflux exposure, the diagnosis of GERD is
ruled out and anti-reflux surgery is no longer indicated as a therapeutic
option. Automatic analysis by computer software has a
rather low reliability for classifying patients as having positive or
negative symptom reflux association because it overestimates reflux
episodes, and especially weakly acid reflux, when compared
to manual analysis. This is why tracings are manually edited
by a physician after automatic analysis, a routine which is open
to possible inaccuracies due to intra- and inter-observer variability,
which have never been evaluated.

Our study is the first one which has focused on this topic and 
has shown good agreement both within each observer and among 
observers. Results of kappa statistics were less satisfactory, especially
regarding non-acid reflux, because the vast majority of patients
had a negative symptom reflux association and it is known
that an imbalance between the 2 options of a binary response
weakens this statistical analysis. Furthermore, kappa was not calculated
for non-acid reflux on 5 occasions, because all patients
had a negative symptom reflux association. Our series is similar
to previous ones, where esophageal symptoms were frequently
found to be functional.
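
The effect of imbalance on kappa can be made concrete with a worked example. The numbers are hypothetical and chosen only for illustration: 40 patients, of whom 38 are rated negative by both observers, 1 is rated positive by the first observer only and 1 by the second observer only. Here $p_o$ is the observed agreement and $p_e$ the agreement expected by chance.

```latex
\begin{align*}
p_o &= \frac{38}{40} = 0.95 \\
p_e &= \frac{1}{40}\cdot\frac{1}{40} + \frac{39}{40}\cdot\frac{39}{40}
     = 0.000625 + 0.950625 = 0.95125 \\
\kappa &= \frac{p_o - p_e}{1 - p_e}
        = \frac{0.95 - 0.95125}{1 - 0.95125}
        = \frac{-0.00125}{0.04875} \approx -0.03
\end{align*}
```

In this illustration the observers agree on 95% of patients, yet kappa is close to 0, because almost all of that agreement is already expected by chance when nearly every rating is negative.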

Our study also investigated intra- and inter-observer agreement
on the number of acid and non-acid reflux episodes. Previous
studies have addressed this topic in the paediatric population and in
healthy adults and adults with GERD, although the 2 latter
studies looked at inter-observer agreement only. Furthermore, in
the paper by Ravi et al, analysis was limited to classifying tracings
as having either a normal or a pathological number of reflux episodes.
In the report by Loots et al, 10 tracings were analysed and
inter-observer agreement was calculated among 10 assessors, whereas
intra-observer agreement was measured among 3 of them. In the
report by Pilic et al, 24 tracings were analysed for inter-observer
agreement and 6 for intra-observer agreement between 2
investigators. In the study by Zerbib et al, 20 tracings off PPIs
and 18 on PPIs obtained from 20 healthy subjects were evaluated
for agreement between 2 observers. It is not easy to compare previous
reports with ours because those studies considered all reflux
episodes together, i.e., without separating acid and non-acid reflux,
and data were presented in different formats. Regarding
inter-observer agreement, the 2 observers in the study by Pilic et al
had an agreement which varied widely among the 24 tracings,
from 0 to 98% with a median of 73%, whereas the 2 observers in
the study by Zerbib et al had an overall agreement of 84% off
PPIs and 73% on PPIs. Our results, obtained by analyzing 40 tracings,
showed lower inter-observer agreement on the number of
non-acid compared to acid reflux episodes (48.9-81.5% vs.
86.3-97.6%). This result was presumably mainly related to observer 2
scoring a lower number of non-acid refluxes, a fact which
is likely to have also contributed to the lower inter-observer
agreement on symptom reflux association for non-acid in comparison
with acid reflux. Episodes of non-acid reflux are thus
more difficult to agree upon, both between software and
an experienced assessor and among experienced assessors.
Reasons for this have not been explored in detail; however, the
absence of a clear drop in pH in situations where impedance readings
are more difficult to interpret (i.e., unclear flow direction,
flow after a swallow, possible reflux during low baseline impedance
and patterns containing gas) is presumably an important
factor. These observations should stimulate, on the one hand,
manufacturers to improve the software used for automatic analysis
and, on the other, training Centers to train more carefully the
physicians approaching this clinical field. Consensus meetings
among experts aiming to propose patterns to be detected as
gastroesophageal reflux in MII-pH tracings are welcome in order
to improve both automatic analysis and inter-observer agreement.
A lower agreement for number of reflux episodes has been suggested
in technically challenging tracings in children and infants.
We measured baseline impedance in order to identify technically
difficult tracings, although others have suggested additional
variables, and found that none was challenging because
baseline impedance was > 500 Ω in all tracings. Our finding is in
agreement with previous data, the vast majority of our patients
being endoscopy negative and with low acid exposure.

A drawback of manual editing after automatic analysis is that 
reviewing the whole tracing is time consuming and a quicker 
analysis confined to the time around symptomatic episodes has 
been advocated in clinical practice. This is why we think that the 
clinically more relevant information coming from our paper is 
that presence/absence of symptom reflux association determined 
after a rapid analysis of MII-pH recordings, i.e., centered in a 
2-minute time window before symptomatic episodes, was highly 
predictive of results after the complete 24-hour analysis. This was 
true both for acid and non-acid reflux. Our observation thus
strongly suggests the reliability of quick manual editing, which
would save physicians' time and Health Care System resources.

In conclusion, results of our prospective multicenter study 
have shown good intra-observer and inter-observer agreement 
for positive/negative symptom reflux association, when MII-pH 
tracings are manually edited after automatic analysis. This applied
both to acid and non-acid reflux, although inter-observer agreement
was lower for non-acid reflux. Furthermore, they have produced
evidence that rapid manual editing of the automatic analysis
confined to a short time window before each symptomatic episode
is highly reliable and can be adopted in routine clinical
practice.